
12 research outputs found

Language support for dynamic, hierarchical data partitioning

Authors: Alex Aiken, Bienia C., Michael Bauer, Sean Treichler
Publication venue: Association for Computing Machinery (ACM)

Exascale Deep Learning for Climate Analytics

Authors: Deslippe Jack, Fatica Massimiliano, Houston Michael, Kurth Thorsten, Luehr Nathan, Mahesh Ankur, Matheson Michael, Mudigonda Mayur, Phillips Everett, Prabhat, Romero Joshua, Treichler Sean
Publication date: 03/10/2018

We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and a parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves peak and sustained throughputs of 1.13 EF/s and 999.0 PF/s, respectively.
Comment: 12 pages, 5 tables, 4 figures; Supercomputing Conference, November 11-16, 2018, Dallas, TX, US

arXiv.org e-Print Archive

Crossref

eScholarship - University of California
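The throughput figures in the abstract above can be sanity-checked with simple arithmetic. The Python sketch below divides sustained throughput by GPU count and then, assuming parallel efficiency means sustained throughput divided by (GPU count times a per-GPU baseline), backs out the per-GPU baseline the quoted efficiency would imply. That definition of efficiency is an assumption for illustration; this listing does not state the paper's exact baseline.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# ASSUMPTION: parallel efficiency is read as
#   sustained / (gpus * per_gpu_baseline);
# the paper's exact baseline definition is not given in this listing.

PF = 1e15  # flop/s in one petaflop/s
TF = 1e12  # flop/s in one teraflop/s


def per_gpu_tflops(total_pflops: float, gpus: int) -> float:
    """Average sustained throughput per GPU, in TF/s."""
    return total_pflops * PF / gpus / TF


def implied_baseline_tflops(total_pflops: float, gpus: int, efficiency: float) -> float:
    """Per-GPU baseline (TF/s) implied by the assumed efficiency definition."""
    return per_gpu_tflops(total_pflops, gpus) / efficiency


runs = {
    # name: (sustained PF/s, GPU count, parallel efficiency), from the abstract
    "Tiramisu, P100": (21.0, 5300, 0.790),
    "DeepLabv3+, V100, single precision": (325.8, 27360, 0.907),
}

for name, (pflops, gpus, eff) in runs.items():
    print(f"{name}: {per_gpu_tflops(pflops, gpus):.2f} TF/s per GPU, "
          f"implied baseline {implied_baseline_tflops(pflops, gpus, eff):.2f} TF/s")
```

Under this reading, the DeepLabv3+ run averages roughly 11.9 TF/s per V100 and the Tiramisu run roughly 4.0 TF/s per P100.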

Structure Slicing: Extending Logical Regions with Fields

Authors: Alex Aiken, Elliott Slaughter, Michael Bauer, Sean Treichler
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/12/2014

Abstract—Applications on modern supercomputers are increasingly limited by the cost of data movement, but mainstream programming systems have few abstractions for describing the structure of a program’s data. Consequently, the burden of managing data movement, placement, and layout currently falls primarily upon the programmer. To address this problem, we previously proposed a data model based on logical regions and described Legion, a programming system incorporating logical regions. In this paper, we present structure slicing, which incorporates fields into the logical region data model. We show that structure slicing enables Legion to automatically infer task parallelism from field non-interference, decouple the specification of data usage from layout, and reduce the overall amount of data moved. We demonstrate that structure slicing enables both strong and weak scaling of three Legion applications, including S3D, a production combustion simulation that uses logical regions with thousands of fields, with speedups of up to 3.68X over a vectorized CPU-only Fortran implementation and 1.88X over an independently hand-tuned OpenACC code.

CiteSeerX

Crossref
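The structure-slicing abstract above says Legion can infer task parallelism from field non-interference. A minimal Python sketch of that test follows, assuming a simplified model in which each task names one region, the set of fields it touches, and a read or write privilege. `RegionRequirement`, `Privilege`, and the field names are hypothetical; Legion's real privileges also include reductions, and aliasing between different regions is ignored here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Privilege(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class RegionRequirement:
    region: str             # hypothetical region name; aliasing between
                            # differently named regions is not modeled
    fields: FrozenSet[str]  # fields of the region the task accesses
    privilege: Privilege


def interferes(a: RegionRequirement, b: RegionRequirement) -> bool:
    """Return True if two requirements force the tasks to run in order."""
    if a.region != b.region:
        return False        # simplification: distinct names never alias
    if a.fields.isdisjoint(b.fields):
        return False        # field non-interference: disjoint field sets
    # Same region, overlapping fields: only read/read pairs may run together.
    return Privilege.WRITE in (a.privilege, b.privilege)


# Usage: tasks updating different fields of one region (hypothetical names).
update_temp = RegionRequirement("state", frozenset({"temperature"}), Privilege.WRITE)
update_press = RegionRequirement("state", frozenset({"pressure"}), Privilege.WRITE)
read_temp = RegionRequirement("state", frozenset({"temperature"}), Privilege.READ)
print(interferes(update_temp, update_press))  # False: may run in parallel
print(interferes(update_temp, read_temp))     # True: must be ordered
```

Without fields, the two writers above would both claim the whole region and be serialized; splitting the check per field is what exposes the extra parallelism.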

Language Support for Dynamic, Hierarchical Data Partitioning

Authors: Alex Aiken, Michael Bauer, Sean Treichler
Publication date: 01/01/2013

Applications written for distributed-memory parallel architectures must partition their data to enable parallel execution. As memory hierarchies become deeper, it is increasingly necessary that the data partitioning also be hierarchical to match. Current language proposals perform this hierarchical partitioning statically, which excludes many important applications where the appropriate partitioning is itself data dependent and so must be computed dynamically. We describe Legion, a region-based programming system, where each region may be partitioned into subregions. Partitions are computed dynamically and are fully programmable. The division of data need not be disjoint and subregions of a region may overlap, or alias one another. Computations use regions with certain privileges (e.g., expressing that a computatio

CiteSeerX
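The partitioning abstract above describes regions split into subregions by partitions that are computed dynamically, may be disjoint or aliased, and may be applied hierarchically. A minimal Python sketch of that idea follows, modeling a region as a set of element indices and a partition as a run-time coloring function that may give an element several colors. `partition_by_coloring` and `is_disjoint` are hypothetical helpers, not Legion's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Set


def partition_by_coloring(elements: Iterable[int],
                          coloring: Callable[[int], Iterable[Hashable]]
                          ) -> Dict[Hashable, Set[int]]:
    """Build subregions from a coloring computed at run time.

    An element that receives several colors lands in several subregions,
    so the resulting subregions may alias one another.
    """
    subregions: Dict[Hashable, Set[int]] = defaultdict(set)
    for e in elements:
        for color in coloring(e):
            subregions[color].add(e)
    return dict(subregions)


def is_disjoint(subregions: Dict[Hashable, Set[int]]) -> bool:
    """True if no element belongs to more than one subregion."""
    seen: Set[int] = set()
    for members in subregions.values():
        if seen & members:
            return False
        seen |= members
    return True


# Usage: 12 hypothetical mesh nodes split into 3 owned pieces (disjoint),
# plus a "ghost" partition where boundary nodes also join the neighbor piece.
nodes = range(12)
owned = partition_by_coloring(nodes, lambda n: [n // 4])
ghost = partition_by_coloring(
    nodes, lambda n: {n // 4, min(n + 1, 11) // 4, max(n - 1, 0) // 4})
print(is_disjoint(owned), is_disjoint(ghost))  # True False

# Hierarchical: partition one subregion again (even and odd nodes).
nested = partition_by_coloring(owned[1], lambda n: [n % 2])
print(nested)  # {0: {4, 6}, 1: {5, 7}}
```

Because the coloring is an ordinary function evaluated at run time, the partition can depend on the data itself, which is the case the abstract says static partitioning schemes exclude.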

Singe

Authors: Alex Aiken, Michael Bauer, Sean Treichler
Publication venue: Association for Computing Machinery (ACM)

Crossref

Legion: Expressing Locality and Independence with Logical Regions

Authors: Alex Aiken, Elliott Slaughter, Michael Bauer, Sean Treichler

Abstract—Modern parallel architectures have both heterogeneous processors and deep, complex memory hierarchies. We present Legion, a programming model and runtime system for achieving high performance on these machines. Legion is organized around logical regions, which express both locality and independence of program data, and tasks, functions that perform computations on regions. We describe a runtime system that dynamically extracts parallelism from Legion programs, using a distributed, parallel scheduling algorithm that identifies both independent tasks and nested parallelism. Legion also enables explicit, programmer-controlled movement of data through the memory hierarchy and placement of tasks based on locality information via a novel mapping interface. We evaluate our Legion implementation on three applications: fluid flow on a regular grid, a three-level AMR code solving a heat diffusion equation, and a circuit simulation.

CiteSeerX

Crossref
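The Legion abstract above says the runtime dynamically extracts parallelism by identifying independent tasks. A heavily simplified, sequential Python sketch of that idea follows: tasks are issued in program order, each names the set of elements it touches and whether it writes, and a task depends on every earlier task whose elements overlap its own when either one writes. This is not Legion's distributed scheduling algorithm; the `Task` type, the read/write-only privilege model (no reductions, no fields), and the level-by-level grouping are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Task:
    name: str
    region: FrozenSet[int]  # elements of the (sub)region the task uses
    writes: bool            # True for read-write, False for read-only


def conflicts(a: Task, b: Task) -> bool:
    """Tasks conflict when their regions overlap and either one writes."""
    return bool(a.region & b.region) and (a.writes or b.writes)


def dependence_graph(tasks: List[Task]) -> Dict[str, List[str]]:
    """For each task, the earlier tasks (in program order) it must wait for."""
    return {t.name: [u.name for u in tasks[:i] if conflicts(u, t)]
            for i, t in enumerate(tasks)}


def schedule_levels(tasks: List[Task]) -> List[List[str]]:
    """Group tasks into levels; tasks within one level are independent."""
    deps = dependence_graph(tasks)
    level: Dict[str, int] = {}
    for t in tasks:  # program order is a valid topological order
        level[t.name] = 1 + max((level[d] for d in deps[t.name]), default=-1)
    out: List[List[str]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for t in tasks:
        out[level[t.name]].append(t.name)
    return out


# Usage: two writers on disjoint subregions, then a reader of the parent region.
left, right = frozenset(range(0, 4)), frozenset(range(4, 8))
tasks = [Task("update_left", left, True),
         Task("update_right", right, True),
         Task("sum_all", left | right, False)]
print(schedule_levels(tasks))  # [['update_left', 'update_right'], ['sum_all']]
```

The two updates land in the same level because their subregions are disjoint, while the reader of the parent region must wait for both, mirroring how region disjointness and privileges together determine independence.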